last updated: September 11 2021
Title: Shifts in the spatial patterns of dental disease in patients with low salivary flow are accompanied by increased rates of microbial migration
Target: Journal of Dental Research
Diana M. Proctor1,2, Christof Seiler 3,6, Adam R. Burns1, Samuel Walker4, Tina Jung4, Jonathan Weng4, Shanne Sastiel4, Yoga Rajendran4, Yvonne Kapila4, Meredith E. Millman4, Gary C. Armitage4, Peter M. Loomer5, Susan P. Holmes6, Mark I. Ryder4, David A. Relman1,7,8*
Affiliations
Abstract Low salivary flow, or hyposalivation, is associated with an increased incidence of dental caries and a shift in their location from biting surfaces towards coronal and root surfaces. However, the relationship between salivary flow and periodontal disease is less clear. To identify clinical indicators of low salivary flow – including the spatial pattern of dental and periodontal disease, features of the supra- and subgingival microbiota, and symptoms of dry mouth – we enrolled individuals into two cohorts. The low flow cohort (N = 32) consisted of individuals with a presumptive diagnosis of the autoimmune disorder Sjögren’s Syndrome (SS) while the control cohort (N = 119) consisted of healthy controls. We constructed a series of tooth-specific linear models to quantify the extent to which patient cohort, age, and unstimulated whole salivary flow rate (UWS-FR), independent of each other, are associated with dental and periodontal disease at each tooth. While age and a diagnosis of SS correlated with the site-specific increment of disease so too did UWS-FR. Not only were lower UWS-FRs associated with a greater number of decayed, missing, or filled surfaces at 21 teeth, but they were also associated with increased recession, as measured by clinical attachment loss (CAL), at 10 teeth. In addition, we examined microbiota community structure at different tooth sites using data from 427 subgingival and supragingival samples of 6 individuals and found that microbial dispersal is reduced in patients with low salivary flow, but only at supragingival and not at subgingival sites. Finally, we found that complaints by subjects of a negative impact on overall quality of life were associated with a UWS-FR less than 0.1 mL/min. Overall, our results suggest that novel predictors of hyposalivation can be identified by integrating clinical, microbial, and patient history data.
Nice tutorials
#load required packages
library("easypackages")
#define packages
libraries("tidyverse",
"phyloseq",
"genefilter",
"plyr",
"knitr",
"modelr",
"gridExtra",
"reshape2",
"purrr",
"stringr",
"ggrepel",
"kableExtra",
"tidyverse",
"cowplot",
"broom",
"ggplot2",
"RColorBrewer",
"viridis",
"ggpubr",
"dplyr",
"ggpmisc",
"ggfortify",
"FactoMineR",
"factoextra",
"colorspace",
"cowplot",
"png",
"sjPlot",
"bestNormalize",
"lmtest",
"effects",
"dabestr",
"vegan",
"DESeq2",
"caret",
'party',
'randomForest',
'rfPermute', '
rpart.plot',
'wesanderson')
set.seed(673)
ggplot2::theme_set(ggplot2::theme_classic())
Let’s read in the clinical datasets
#read in data about pids and cohort
pids <- read.csv(file="~/Dropbox/oral-clinical-data-analysis-data/cohort_20191019.csv") #152 row
#read in age
age.df = read.csv("~/Dropbox/oral-clinical-data-analysis-data/age_df.csv") #152 rows
age.df$X = NULL
age.df= plyr::join(age.df, pids)
#read in flow rates
uwsfr = read.csv("~/Dropbox/oral-clinical-data-analysis-data/uwsfr_df.csv") #152 rows
uwsfr$X = NULL
uwsfr= plyr::join(uwsfr, pids)
#read the coordinates
map <- read.csv("~/Dropbox/oral-clinical-data-analysis-data/mytoothdot_coordinates_FullMouth.csv")
map <- na.omit(map)
colnames(map)[4] <- "tooth"
map$X.SampleID = NULL
#read in the caries data
caries.data <- read.csv(file="~/Dropbox/oral-clinical-data-analysis-data/caries_data_v3.0.csv")
caries.df = plyr::join(caries.data, uwsfr)
caries.df = plyr::join(caries.df, age.df)
caries_map = plyr::join(caries.df, map, by="tooth")
#read in the perio data and create a uwsfr subset
perio.data <- read.csv(file="~/Dropbox/oral-clinical-data-analysis-data/periodontal_data_v2.0.csv")
perio.df = plyr::join(perio.data, uwsfr)
perio.df = plyr::join(perio.df, age.df)
perio_map = plyr::join(perio.df, map, by="tooth")
#read in an image of the mouth
#create color so that 0 is midpoint and set as white
#load and setup background mouth diagram
img <- readPNG("~/Dropbox/mouth_smaller.png")
g<- grid::rasterGrob(img, interpolate=TRUE, height = unit(.95,"npc"), width = unit(.9, "npc"))
#transformations
caries_map$DMFS_orderNorm =orderNorm(caries_map$DMFS.bytooth)$x.t
perio_map$cal_orderNorm = orderNorm(perio_map$subject.tooth.mean.cal)$x.t #normal
perio_map$pd_orderNorm = orderNorm(perio_map$subject.tooth.mean.pd)$x.t #normal
perio_map$gmcej_orderNorm =orderNorm(perio_map$subject.tooth.mean.gm.cej)$x.t #normal
perio_map$bop_orderNorm =orderNorm(perio_map$bop)$x.t
Figure 1: Gardner-Altman estimation plots reveal significant demographic and clinical differences between low flow and control cohorts. Data for control and low flow cohort patient Swarm plots of a) age b) UWS-FR, c) DMFS, d) CAL e) GM-CEJ, f) BOP and g) PD are plotted on the left-aligned axis. The right-aligned axis reveals the point estimate of the unpaired mean difference between groups, surrounded by their 95% confidence intervals (95% CI).
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Figure1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#define a function to summarize the clinical data
summarize_clinical_feature <- function(df, vars) {
mydf <- df %>%
ungroup(.) %>%
dplyr::select(vars) %>%
unique(.) %>%
na.omit(.)
return(mydf)
}
#### Plot uws-fr
uwsfr$complete = complete.cases(uwsfr$uwsfr)
uwsfr$`UWS-FR (mL/min)` = uwsfr$uwsfr
uwsfr = subset(uwsfr, complete==TRUE)
uws.fr.plot <-
uwsfr %>%
dabest(cohort, `UWS-FR (mL/min)`,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
Fig1A = plot(uws.fr.plot, axes.title.fontsize = 12)
#get the mean uws-fr
#detach(package:plyr)
#out = uwsfr %>%
# group_by(cohort) %>%
# summarize(mean_uws = mean(uwsfr, na.rm = TRUE))
#out
#plot age
age.df$`Age (Years)` = age.df$age
age.df = subset(age.df, redcap !=268) # drop 268 since we don't have clinical data for this subject
age.plot <-
age.df %>%
dabest(cohort, `Age (Years)` ,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
Fig1B = plot(age.plot, axes.title.fontsize = 12)
#get the mean ages by cohort
#out = age.df %>%
# group_by(cohort) %>%
# summarize(mean_age = mean(age, na.rm = TRUE))
#out
age.sub = unique(age.df$redcap)
#get the proportion of controls who were over the age of 40 (N=23; 19.33%)
over40c = subset(age.df, age > 40 & cohort=="Control")
c = subset(age.df, cohort=="Control")
nrow(over40c)/nrow(c)
## [1] 0.1932773
#plot dmfs
caries_map <- filter(caries_map, tooth != 1 & tooth != 16 & tooth != 17 & tooth != 32)
perio_map <- filter(perio_map, tooth != 1 & tooth != 16 & tooth != 17 & tooth != 32)
#This subfigure compares overall DMFS score per subject between cohorts
dmfs.df <- summarize_clinical_feature(df=caries_map, vars=c("redcap", "aim", "DMFS_orderNorm", "cohort"))
dmfs.df2 <- summarize_clinical_feature(df=caries_map, vars=c("redcap", "aim", "DMFS.bysubject", "cohort"))
dmfs.df2$DMFS = dmfs.df2$DMFS.bysubject
dmfs.plot <-
dmfs.df2 %>%
dabest(cohort, DMFS,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
dmfs.sub = unique(dmfs.df2$redcap)
Fig1C = plot(dmfs.plot, axes.title.fontsize = 12)
#get the average dmfs by cohort
#out = dmfs.df2 %>%
# group_by(cohort) %>%
# summarize(mean_dmfs = mean(DMFS.bysubject, na.rm = TRUE))
#out
#it looks like there's a subset with low DMFS, the subset has fewer than 15; when we subset on these, they have a lower average age than the individuals in the upper group
dfms.low = subset(dmfs.df2, DMFS.bysubject < 15)
dfms.low.subjects = dfms.low$redcap
ages_with_low_dmfs = subset(age.df, redcap %in% dfms.low.subjects)
ages_with_higher_dmfs = subset(age.df, !(redcap %in% dfms.low.subjects))
#plot cal
cal.df = summarize_clinical_feature(perio_map, vars=c("redcap", "aim", "mean.cal", "cohort"))
cal.df$`Mean CAL` =cal.df$mean.cal
cal.plot <-
cal.df %>%
dabest(cohort, `Mean CAL`,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
Fig1D = plot(cal.plot, axes.title.fontsize = 12)
#CAL by group
#out = cal.df %>%
# group_by(cohort) %>%
# summarize(mean_cal = mean(mean.cal, na.rm = TRUE))
#out
#plot PD
pd.df <- summarize_clinical_feature(df=perio_map, vars=c("redcap", "aim", "subject.mean.pd", "cohort"))
pd.df$`Mean PD` = pd.df$subject.mean.pd
pd.plot <-
pd.df %>%
dabest(cohort, `Mean PD`,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
Fig1E = plot(pd.plot, axes.title.fontsize = 12)
#plot gmcej between cohorts.
gmcej.df <- summarize_clinical_feature(df=perio_map, vars=c("redcap", "aim", "subject.mean.gm.cej", "cohort"))
gmcej.df$`Mean GM-CEJ` = gmcej.df$subject.mean.gm.cej
gmcej.plot <-
gmcej.df %>%
dabest(cohort, `Mean GM-CEJ`,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
Fig1F = plot(gmcej.plot, axes.title.fontsize = 12)
#CAL by group
#out = gmcej.df %>%
# group_by(cohort) %>%
# summarize(mean_gm = mean(subject.mean.gm.cej, na.rm = TRUE))
#out
#plot bop
bop.df <- summarize_clinical_feature(df=perio_map, vars=c("redcap", "aim", "percent.bop", "cohort"))
bop.df$`Percent BOP` = bop.df$percent.bop
bop.plot <-
bop.df %>%
dabest(cohort, `Percent BOP`,
idx = c("Control", "Low Flow"),
paired = FALSE)%>%
mean_diff()
Fig1G = plot(bop.plot, axes.title.fontsize = 12)
#bop by group
#out = bop.df %>%
# group_by(cohort) %>%
# summarize(mean_bop = mean(percent.bop, na.rm = TRUE))
#out
##### Make Figure 1
Figure1 <- cowplot::plot_grid(
Fig1A + theme(legend.position = "none", axis.text=element_text(size=12)),
Fig1B + theme(legend.position = "none"),
Fig1C + theme(legend.position = "none"),
Fig1D + theme(legend.position = "none"),
Fig1F + theme(legend.position = "none"),
Fig1G + theme(legend.position = "none"),
Fig1E + theme(legend.position = "none"),
labels = "auto", ncol = 2)
Figure1
ggsave(Figure1, file="~/Dropbox/oral-clinical-data-analysis-ms/Figure1.png", width=10, height=12, units="in", dpi=300)
How many “control” subjects have flow rates less than 0.3?
#how many controls have flow rates less than 0.3
control_N0.3 = subset(uwsfr, aim %in% c(1, 2)) %>%
subset(., uwsfr <= 0.3) #35
#how many have flow rates less than 0.2
control_N0.2 = subset(uwsfr, aim %in% c(1, 2)) %>%
subset(., uwsfr <= 0.2) #18
control_N0.1 = subset(uwsfr, aim %in% c(1, 2)) %>%
subset(., uwsfr <= 0.1) #3
Note that these are drawn from the final version of the consent forms.
Sjogren’s Syndrome (SS) subjects were included in the study if they were adults over 18 years old, complained of dry mouth, and had been diagnosed at least three months ago with Sjogren’s Syndrome as per the American European Consensus Criteria37 or the American College of Rheumatology Classification Criteria.38 The diagnosis was confirmed with doctor’s note at a later appointment. The diagnosis date of SS was also collected verbally and verified with subjects’ documentation. Otherwise healthy control subjects were included in the study if they were healthy, non-smoking adults over the age of 18.
Exclusion criteria for the SS group were: 1) having fewer than 15 non-implant teeth, 2) smoke or use chewing tobacco or snuff or quit using tobacco products within the 6 months preceding enrollment, 3) being treated by a physician for an uncontrolled chronic medical condition, 4) have symptoms of or treatment of asthma or acid reflux in the last 3 months, 5) history of radiation therapy to the head or neck, 6) history of oral, systemic antibiotics or antifungals use within the 6 month period preceding enrollment, 7) required to take antibiotics before dental treatment, 8) history of stimulant or heroin abuse or of eating disorders, 9) lactating, pregnant, or intending to become pregnant, 10) any dental treatment during the one month period preceding enrollment and cannot or will not abstain from dental treatments during their enrollment, 11) have fixed dental appliance (retainers, fixed dentures, braces, orthodontic wires), 12) periodontitis, candidiasis, halitosis, tooth pain, or any other disease in the mouth (to patient’s knowledge), 13) diagnosed with Sjogren’s syndrome by the American European Consensus Criteria [37] or the American College of Rheumatology Classification Criteria38 fewer than 3 months prior to the date of enrollment.
Exclusion criteria for the control group were: 1) having fewer than 15 natural, non-implant teeth. 2) missing both central incisors, both canines, or both first molars in either the maxilla or the mandible, 3) crown or implant replacing both central incisors, both canines, or both first molars in either the maxilla or the mandible, 4) currently being treated by a physician for any chronic medical condition, including asthma and acid reflux, 5) history of radiation therapy to the head or neck, 6) take any medication on a daily basis other than birth control, 7) history of oral, systemic antibiotics or antifungals use within the 6 month period preceding enrolment, 8) required to take antibiotics before dental treatment, 9) history of stimulant or heroin abuse or of eating disorders 10) lactating, pregnant, or intending to become pregnant, 11) any dental treatment during the 1 month period preceding enrollment and cannot or will not abstain from dental treatments during their enrollment, 12) fixed dental appliance (retainers, fixed dentures, braces, orthodontic wires), 13) experienced dry mouth for a full week at any time in the past 6 months, 14) periodontitis, candidiasis, halitosis, tooth pain, or any other disease in the mouth (to patient’s knowledge).
Chi-square tests indicate that the distribution of patients in each cohort didn’t differ with respect to gender, race, or ethnicity.
We have demographic data for 33 people, but actual data for only 32.
#read in the race data and perform chi-square tets
race.df = read.csv("~/Dropbox/oral-clinical-data-analysis-data/race_df.csv")[1:7,1:3]
chisq.test(race.df$Controls, race.df$Low.Flow)
##
## Pearson's Chi-squared test
##
## data: race.df$Controls and race.df$Low.Flow
## X-squared = 28, df = 24, p-value = 0.26
#read in ethnicity data and do chi-sq tests
ethnic.df <- read.csv("~/Dropbox/oral-clinical-data-analysis-data/ethnicity_df.csv")[1:3,1:3]
chisq.test(ethnic.df$Control, ethnic.df$Low.Flow)
##
## Pearson's Chi-squared test
##
## data: ethnic.df$Control and ethnic.df$Low.Flow
## X-squared = 6, df = 4, p-value = 0.1991
#read in gender and do chi-square tetst
gender.df <- read.csv("~/Dropbox/oral-clinical-data-analysis-data/gender_df.csv")[1:5,1:3]
chisq.test(gender.df$Control, gender.df$Low.Flow)
##
## Pearson's Chi-squared test
##
## data: gender.df$Control and gender.df$Low.Flow
## X-squared = 7.5, df = 6, p-value = 0.2771
Let’s define some functions to generate the linear models
#set up the vector for teeth
tooth.model = c(rep(2:15, each=5), rep(18:31, each=5))
#Add tooth class to dataset.
define_tooth_class <- function(df){
df$tooth.class <- ifelse(df$tooth %in% c(1:3, 14:19,30:32), "Molar",
ifelse(df$tooth %in% c(4:5, 12:13, 20:21, 28:29), "Pre.Molar",
ifelse(df$tooth %in% c(6, 11, 22, 27), "Canine",
ifelse(df$tooth %in% c(7, 10, 23, 26), "Lateral.Incisor",
ifelse(df$tooth %in% c(8, 9, 24, 25), "Central.Incisor", "error")))))
return(df)
}
#This function gets the per-tooth regression coefficients and 95% confidence intervals
# we also get the pvalues
get_spltmap_out <- function(df, myvector, model){
conf <- df %>%
split(., myvector) %>%
map(~ lm(model, data = .)) %>%
map_df(~ tidy(., conf.int = TRUE, conf.level = 0.95))
df = data.frame(conf)
df$tooth <- tooth.model
df$term <- gsub("cohortSjögren's", "Cohort", df$term)
df$term <- gsub("age", "Age", df$term)
df$term <- gsub("cohortLow Flow", "Cohort", df$term)
df$term <- gsub("cohort:uwsfr", "Cohort:UWS-FR", df$term)
df$term <- gsub("uwsfr", "UWS-FR", df$term)
df = subset(df, term !="(Intercept)")
return(df)
}
### set up tthe plot for age while specifying the model
# ylim(-0.35, 0.7)
plot_age <- function(df, model){
p <- ggplot(filter(df, term == "Age"), aes(x = tooth, y = estimate, color = tooth.class, shape=Significance)) +
geom_point(size=3) +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 16.5, color = "gray", linetype="dotted") +
geom_errorbar(aes(ymin = conf.low , ymax = conf.high)) +
coord_flip() +
ggtitle(paste0(model, " ", "~ Age")) +
labs(x = "Universal Tooth Number", y = "Estimate", color = "Tooth Class")
}
#set up plot for term cohort and model
#+ylim(-1.6, 1.7)
plot_cohort <- function(df, model){
p <- ggplot(filter(df, term == "cohort"),
aes(x = tooth, y = estimate, color = tooth.class, shape=Significance)) +
geom_point(size=3) +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 16.5, color = "gray", linetype="dotted") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip() +
ggtitle(paste0(model, " ", "~ Cohort")) +
labs(x = "Universal Tooth Number", y = "Estimate", color = "Tooth Class")
}
#plot the term uwsfr and the model
#+ylim(-1.1, 1)
plot_flow <- function(df, model) {
p <- ggplot(filter(df, term == "UWS-FR"),
aes(x = tooth, y = estimate, color = tooth.class, shape=Significance)) +
geom_point(size=3) +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 16.5, color = "gray", linetype="dotted") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip() +
ggtitle(paste0(model, " ", "~ UWS-FR")) +
labs(x = "Universal Tooth Number", y = "Estimate", color = "Tooth Class")
}
#plot the interaction
#ylim(-4, 4)
plot_interaction <- function(df, model) {
p <- ggplot(filter(df, term == "Cohort:UWS-FR"),
aes(x = tooth, y = estimate, color = tooth.class, shape=Significance)) +
geom_point(size=3) +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 16.5, color = "gray", linetype="dotted") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip() +
ggtitle(paste0(model, " ", "~ Cohort:UWS-FR")) +
labs(x = "Universal Tooth Number", y = "Estimate", color = "Tooth Class")
}
Look at DMFS
#we want to scale the data so that predictors can be compared
tooth.model = tooth.model[!(tooth.model %in% c(1, 16, 17, 32))]
caries.df <- dplyr::select(caries_map,
c("DMFS.bytooth", "cohort", "age", "uwsfr", "tooth")) %>%
filter(., !(tooth %in% c(1, 16, 17, 32) ))
#subset on the variables of interest
keep = dplyr::select(caries.df, c("DMFS.bytooth", "tooth"))
caries.df$cohort = plyr:::revalue(caries.df$cohort, c("Control"=0, "Low Flow"=1))
caries.df$cohort = as.numeric(as.character(caries.df$cohort))
#center and scale
caries.df.scaled = data.frame(scale(caries.df[,2:4], center=TRUE, scale=TRUE))
caries.df.scaled = data.frame(cbind(keep), caries.df.scaled)
caries.df.scaled$dmfs_orderNorm = orderNorm(caries.df.scaled$DMFS.bytooth)$x.t #normal
#run the dmfs model on the normalized d
dmfs.model = 'dmfs_orderNorm ~ cohort + uwsfr + cohort:uwsfr + age'
df = get_spltmap_out(df=caries.df.scaled,
myvector=as.vector(caries.df.scaled$tooth),
model=dmfs.model)
df = define_tooth_class(df)
#adjust the p-values for multiple testing
df$p.adj = p.adjust(df$p.value, method="BH", n=length(df$p.value))
df$Significance = ifelse(df$p.adj < 0.05, "Yes", "No")
#combine coefs and confs and filter out intercept
#add factor to facet by
df$terms_f <- factor(df$term, levels=c("Age", "Cohort", "UWS-FR", "Cohort:UWS-FR"))
#Plot results for the DMFS Model
Fig2A = plot_age(df, model="DMFS")
#age.sig = subset(df, p.adj < 0.05 & term=="Age") # age impacts all teeth
#age.sig$Significant = ifelse(age.sig$p.value <0.05, "Yes","No")
Fig2B = plot_cohort(df, model="DMFS")
#dfms.sig = subset(df, p.adj < 0.05 & term=="cohort")
#on average difference between anterior to posterior
#ant = subset(df, tooth.class %in%
# c("Canine", "Central.Incisor", "Lateral.Incisor" ) & term=="cohort" )
#post = subset(df, tooth.class %in% c("Molar", "Pre.Molar" ) & term=="cohort" )
#mean(ant$estimate)/mean(post$estimate)
#signeg = subset(dfms.sig, estimate < 0)
#sigPos = subset(dfms.sig, p.adj < 0.05 & estimate >-0.04 )
Fig2C = plot_flow(df, model="DMFS")
#uws.sig = subset(df,p.adj < 0.05 & term=="UWS-FR")
Fig2D = plot_interaction(df, model="DMFS")
#int.sig = subset(df, p.adj < 0.05 & term=="Cohort:UWS-FR") # 11 teeth
Supplementary Figure 1: Scatterplot of DMFS as a function of UWS-FR. Linear models were fit for each cohort. As flow rate decreases the number of DMFS increases at a greater rate in the low flow cohort (blue) compared to controls (orange).
SuppFig1 = subset(caries_map, cohort %in% c("Control", "Low Flow")) %>%
ggplot(., aes(uwsfr , DMFS.bysubject , color=as.factor(cohort)),
group=as.factor(cohort)) +
geom_point() + geom_smooth(method = lm)
ggsave(SuppFig1, file="~/Dropbox/oral-clinical-data-analysis-ms/SupplementalFigure1.png", width=10, height=12, units="in", dpi=300)
Set up periodontal data frame
plot_cohort <- function(df, model){
p <- ggplot(filter(df, term == "Cohort"), aes(x = tooth, y = estimate, color = tooth.class, shape=Significance)) +
geom_point(size=3) +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 16.5, color = "gray", linetype="dotted") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip() +
ggtitle(paste0(model, " ", "~ Cohort")) +
labs(x = "Universal Tooth Number", y = "Estimate", color = "Tooth Class")
}
perio.df <- dplyr::select(perio_map, c("cal", "pd","gm.cej","bop",
"cohort", "age", "uwsfr", "tooth")) %>%
filter(., tooth != 1 & tooth != 16 & tooth != 17 & tooth != 32)
#transformations
keep = dplyr::select(perio.df, c("cal",
"pd",
"gm.cej",
"bop",
"cohort",
"tooth"))
perio.df$cohort = plyr:::revalue(perio.df$cohort, c("Control"=0, "Low Flow"=1))
perio.df$cohort = as.numeric(as.character(perio.df$cohort))
perio.df.scaled = data.frame(scale(perio.df[,5:7], center=TRUE, scale=TRUE)) #scale the predictors
perio.df.scaled = data.frame(cbind(keep), perio.df.scaled)
CAL model
perio.df.scaled$cal_orderNorm = orderNorm(perio.df.scaled$cal)$x.t
cal.model = 'cal_orderNorm ~ cohort + uwsfr + cohort:uwsfr + age'
mycoefficients = get_spltmap_out(df=perio.df.scaled,
myvector=as.vector(perio.df.scaled$tooth),
model=cal.model)
df = define_tooth_class(mycoefficients)
#adjust the p-values for multiple testing
df$p.adj = p.adjust(df$p.value, method="BH", n=length(df$p.value))
df$Significance = ifelse(df$p.adj < 0.05, "Yes", "No")
#add factor to facet by
df$terms_f <- factor(df$term, levels=c("Age", "cohort", "UWS-FR", "Cohort:UWS-FR"))
#Plot results for the CAL Model
Fig2E = plot_age(df, model="CAL")
#age.sig = subset(df, p.adj < 0.05 & term=="Age")
Fig2F = plot_cohort(df, model="CAL")
#cal.sig = subset(df, p.adj < 0.05 & term=="Cohort")
#antMax = subset(cal.sig, tooth %in% c(6, 7, 8, 9, 10, 11))
#min(antMax$estimate)
#max(antMax$estimate)
#mean(antMax$estimate)
#antMan = subset(cal.sig, tooth %in% c(22, 23, 24, 25, 26, 27))
#min(antMan$estimate)
#max(antMan$estimate)
#mean(antMan$estimate)
#postMax = subset(cal.sig, tooth %in% c(3, 4, 5, 12, 13, 14))
#min(postMax$estimate)
#max(postMax$estimate)
#mean(postMax$estimate)
#postMan = subset(cal.sig, tooth %in% c(20, 21, 30))
#min(postMan$estimate)
#max(postMan$estimate)
#mean(postMan$estimate)
Fig2G = plot_flow(df, model="CAL")
#uws.sig = subset(df, p.adj < 0.05 & term=="UWS-FR")
Fig2H = plot_interaction(df, model="CAL")
#int.sig = subset(df, p.adj < 0.05 & term=="Cohort:UWS-FR")
probing depth
perio.df.scaled$pd_orderNorm = orderNorm(perio.df.scaled$pd)$x.t
pd.model = 'pd_orderNorm ~ cohort + uwsfr + cohort:uwsfr + age'
mycoefficients = get_spltmap_out(df=perio.df.scaled,
myvector=as.vector(perio.df.scaled$tooth),
model=pd.model)
df = define_tooth_class(mycoefficients)
#adjust the p-values for multiple testing
df$p.adj = p.adjust(df$p.value, method="BH", n=length(df$p.value))
df$Significance = ifelse(df$p.adj < 0.05, "Yes", "No")
#add factor to facet by
df$terms_f <- factor(df$term, levels=c("Age", "Cohort", "UWS-FR", "Cohort:UWS-FR"))
#Plot results for the PD Model
Fig2I = plot_age(df, model="PD")
#pd.sig = subset(df, p.adj < 0.05 & term=="Age")
Fig2J = plot_cohort(df, model="PD")
#pd.sig = subset(df, p.adj < 0.05 & term=="Cohort")
Fig2K = plot_flow(df, model="PD")
#pd.sig = subset(df, p.adj < 0.05 & term=="UWS-FR")
Fig2L = plot_interaction(df, model="PD")
gm-cej
perio.df.scaled$gm.cej_orderNorm = orderNorm(perio.df.scaled$gm.cej)$x.t
gm.model = 'gm.cej_orderNorm ~ cohort + uwsfr + cohort:uwsfr + age'
mycoefficients = get_spltmap_out(df=perio.df.scaled,
myvector=as.vector(perio.df.scaled$tooth),
model=gm.model)
df = define_tooth_class(mycoefficients)
#adjust the p-values for multiple testing
df$p.adj = p.adjust(df$p.value, method="BH", n=length(df$p.value))
df$Significance = ifelse(df$p.adj < 0.05, "Yes", "No")
#add factor to facet by
df$terms_f <- factor(df$term, levels=c("Age", "Cohort", "UWS-FR", "Cohort:UWS-FR"))
#Plot results for the GM-CEJ Model
Fig2M = plot_age(df, model="GM-CEJ")
Fig2N = plot_cohort(df, model="GM-CEJ")
#gm.sig = subset(df, p.adj < 0.05 & term=="Cohort")
#antMax = subset(gm.sig, tooth %in% c(7, 10, 12))
#min(antMax$estimate)
#max(antMax$estimate)
#mean(antMax$estimate)
#antMan = subset(gm.sig, tooth %in% c(22, 23, 24, 25, 26, 27))
#min(antMan$estimate)
#max(antMan$estimate)
#mean(antMan$estimate)
#postMax = subset(gm.sig, tooth %in% c(3, 4, 5, 12, 13, 14))
#min(postMax$estimate)
#max(postMax$estimate)
#mean(postMax$estimate)
#postMan = subset(gm.sig, tooth %in% c(20, 21, 30))
#min(postMan$estimate)
#max(postMan$estimate)
#mean(postMan$estimate)
Fig2O = plot_flow(df, model="GM-CEJ")
#gm.sig = subset(df, p.adj < 0.05 & term=="UWS-FR")
Fig2P = plot_interaction(df, model="GM-CEJ")
#int.sig = subset(df, p.adj < 0.05 & term=="Cohort:UWS-FR")
bop
perio.df.scaled$bop_orderNorm =orderNorm(perio.df.scaled$bop)$x.t
bop.model = 'bop_orderNorm ~ cohort + uwsfr + cohort:uwsfr + age'
mycoefficients = get_spltmap_out(df=perio.df.scaled,
myvector=as.vector(perio.df.scaled$tooth),
model=bop.model)
df = define_tooth_class(mycoefficients)
#adjust the p-values for multiple testing
df$p.adj = p.adjust(df$p.value, method="BH", n=length(df$p.value))
df$Significance = ifelse(df$p.adj < 0.05, "Yes", "No")
#add factor to facet by
df$terms_f <- factor(df$term, levels=c("Age", "Cohort", "UWS-FR", "Cohort:UWS-FR"))
#Plot results for the BOP Model
Fig2Q = plot_age(df, model="BOP")
#age=subset(df, term=="Age" & Significance=="Yes")
Fig2R = plot_cohort(df, model="BOP")
#bop.sig = subset(df, p.adj <= 0.05 & term=="Cohort")
Fig2S = plot_flow(df, model="BOP")
#bop.sig = subset(df, p.adj < 0.05 & term=="UWS-FR")
Fig2T = plot_interaction(df, model="BOP")
#bop.sig = subset(df, p.adj < 0.05 & term=="Cohort:UWS-FR")
Figure 2: Explicit spatial modeling reveals independent effects of age and patient cohort on the spatial pattern of dental disease. Coefficients with uncertainty intervals are plotted for a series of per-tooth linear models where DMFS (a), CAL (b), and GM-CEJ (c) were regressed against age, patient cohort, UWS-FR and the interaction between UWS-FR and cohort.
#~~~~~~~~~~~~~~~~~~~~
fig2legend <- cowplot::get_legend(Fig2A)
Fig2 <- cowplot::plot_grid((Fig2A + theme(legend.position = "none")), #DMFS~Age
(Fig2E + theme(legend.position = "none")), #CAL-age
(Fig2M + theme(legend.position = "none")), #GM-CEJ
(Fig2Q + theme(legend.position = "none")), #BOP
(Fig2I + theme(legend.position = "none")),
(Fig2B + theme(legend.position = "none")), #DMFS~Cohort
(Fig2F + theme(legend.position = "none")), #CAL-cohort
(Fig2N + theme(legend.position = "none")), #GM-CEJ
(Fig2R + theme(legend.position = "none")), #BOP
(Fig2J + theme(legend.position = "none")),
(Fig2C + theme(legend.position = "none")), #DMFS~UWS-Fr
(Fig2G + theme(legend.position = "none")),#CAL
(Fig2O + theme(legend.position = "none")), #GM-CEJ
(Fig2S + theme(legend.position = "none")), #BOP
(Fig2K + theme(legend.position = "none")),
(Fig2D + theme(legend.position = "none")), #DMFS~interaction
(Fig2H + theme(legend.position = "none")), #CAL
(Fig2P + theme(legend.position = "none")), #GM-CEJ
(Fig2T + theme(legend.position = "none")), #BOP
(Fig2L + theme(legend.position = "none")),
ncol = 5, labels = c("a", "b", "c", "d",
"e", "f", "g", "h",
"i", "j", "k", "l",
"m", "n", "o", "p",
"q", "r", "s", "t"), hjust = -2.75, vjust = 1.5)
Figure2 <- cowplot::plot_grid(Fig2, fig2legend, nrow = 1, rel_widths = c(1,.3))
ggsave(Figure2, file="~/Dropbox/oral-clinical-data-analysis-ms/Figure2.png", width = 17, height = 10, units ="in",dpi = 300, device = "png")
Figure2
Figure 3: Subgingival and supragingival communities from 3 control and 3 low flow subjects segregate by UWS-FR while age imperfectly separates supragingival communities. Principal coordinates analysis on bray Curtis dissimilarity segregated communities by UWS-FR (a) along the second coordinate which explained 12.8% of the variance. Samples clustered by age (b) for subgingival, but not supragingival sites.
library(wesanderson)
library(ggsci)
add_plot_elements <- function(x) {
theme_update() +
theme(plot.title = element_text(color="black", size=12),
axis.title.x = element_text(color="black", size=12),
axis.title.y = element_text(color="black", size=12),
text = element_text(size=12),
axis.text.x = element_text(angle=0, hjust=1),
panel.spacing = unit(0, "lines"),
strip.background = element_blank(),
strip.placement = "outside")
}
pal <- wes_palette("Zissou1", 10, type = "continuous")
pal2 = wes_palette("Royal1", 5, "continuous")
buda.pal <-wes_palette("Cavalcanti1", 10, "continuous")
#read in the microbial data and create a unified mapping file that includes the clinical data
clin2microbe = read.csv("~/Dropbox/oral-clinical-data-analysis-data/clin2microbemap.csv")
illumina = readRDS("~/Dropbox/oral-clinical-data-analysis-data/periodontology2000_hypo.RDS")
illumina = subset_samples(illumina, Protocol=="Clinic")
old.map = data.frame(sample_data(illumina))
new_map =merge(old.map, clin2microbe)
new_map$Duplication = duplicated(new_map$X.SampleID)
new_map = subset(new_map, Duplication !="TRUE")
rownames(new_map) = new_map$X.SampleID
sample_data(illumina) = sample_data(new_map)
sample_data(illumina)$DMFS = sample_data(illumina)$DMFS.bytooth
sample_data(illumina)$Habitat_Class = ifelse(sample_data(illumina)$Habitat_Class=="Supra",
"Supragingival", "Subgingival")
# use phyloseq to ordinate the phyloseq object
ord = ordinate(illumina, method="PCoA", distance="bray")
sample_data(illumina)$Cohort = ifelse(sample_data(illumina)$Aim=="SA1", "Control", "Low Flow")
Figure3a = plot_ordination(illumina, ord, type="samples", color="uwsfr", shape="Cohort") +
scale_color_gradientn(colours = pal) +
facet_wrap(~Habitat_Class, scales="free") + theme_classic() + add_plot_elements()
Figure3b = plot_ordination(illumina, ord, type="samples", color="age", shape="Cohort") +
scale_color_viridis_c() +
facet_wrap(~Habitat_Class, scales="free")+ theme_classic() + add_plot_elements()
sample_data(illumina)$DMFS.bytooth = as.factor(sample_data(illumina)$DMFS.bytooth)
Figure3c = plot_ordination(illumina, ord, type="samples",
color="DMFS.bytooth", shape="Cohort") +
scale_color_manual(values=buda.pal) +
facet_wrap(~Habitat_Class, scales="free") +
theme_classic() + geom_point()+ add_plot_elements()
Figure3 <- cowplot::plot_grid((Figure3a),
(Figure3b),
(Figure3c),
ncol = 1, labels = c("a", "b", "c"), hjust = -2.75, vjust = 1.5)
Figure3
ggsave(Figure3, file="~/Dropbox/oral-clinical-data-analysis-ms/Figure3.png", width = 8, height = 10, units ="in",dpi = 300, device = "png")
Supplementary Figure 2. Patterns observed in Figure 3 can be seen using a variety of different ecological distance metrics.
library(vegan)
###do relative ab?undance transformation and see if the patterns are the same
illuminaRA = transform_sample_counts(illumina, function(x) x / sum(x) )
#replace the otu table with hellinger transformed data
otus = data.frame(otu_table(illuminaRA))
outs.h = decostand(otus, "hellinger")
otu_table(illuminaRA) = otu_table(outs.h, taxa_are_rows=FALSE)
#make a list of nmds objects
dist_methods_deco = c("binomial", "bray", "canberra", "euclidean", "gower", "jaccard", "kulczynski", "manhattan")
nmds_ideco.list <- vector("list", length(dist_methods_deco))
plot_ideco.list <- vector("list", length(dist_methods_deco))
for(i in 1:length(dist_methods_deco)){
iDist <- vegan::vegdist(outs.h, method=dist_methods_deco[i])
iMDS <- ordinate(illuminaRA, method="PCoA", distance=iDist, correction="lingoes")
p <- plot_ordination(illuminaRA, iMDS, color="Tooth_Class") +
facet_wrap(Jaw_Quadrant~Tooth_Aspect)
ps <- p$data
ps$Distance <- dist_methods_deco[[i]]
plot_ideco.list[[i]] <- ps
nmds_ideco.list[[i]] <- iMDS
}
plot_ideco <- do.call("rbind", plot_ideco.list)
FigureS2a=ggplot(plot_ideco, aes(Axis.1, Axis.2, color=uwsfr)) +
facet_wrap(Distance~Habitat_Class, scales="free", ncol=2) + geom_point() + theme_bw()+
scale_color_gradientn(colours = pal) + ggtitle("UWSFR")
FigureS2b=ggplot(plot_ideco, aes(Axis.1, Axis.2, color=age)) +
facet_wrap(Distance~Habitat_Class, scales="free", ncol=2) + geom_point() + theme_bw()+
ggtitle("Age") + scale_color_viridis_c()
FigureS2c=ggplot(plot_ideco, aes(Axis.1, Axis.2, color=DMFS.bytooth)) +
facet_wrap(Distance~Habitat_Class, scales="free", ncol=2) + geom_point() + theme_bw()+
ggtitle("DMFS")+scale_color_manual(values=buda.pal)
ggsave(grid.arrange(FigureS2a, FigureS2b, FigureS2c, ncol=3), file="~/Dropbox/oral-clinical-data-analysis-ms/FigureS2.png", width = 12, height = 14, units ="in",dpi = 300)
Supplementary Figure 4: Constrained correspondence analysis of microbial data using clinical variates as predictors. canonical correlation show age, pd, and dmfs are correlated
library(ggvegan)
buda.pal <-wes_palette("Cavalcanti1", 10, "continuous")
#set an analysis up for CCA
phyCLR= transform_sample_counts(illumina, function(x) x/sum(x))
#phyCLR = transform_sample_counts(illumina, function(x) compositions::clr(x))
map1 = data.frame(sample_data(phyCLR)$Tooth_Class, sample_data(phyCLR)$uwsfr,
sample_data(phyCLR)$DMFS.bytooth,
sample_data(phyCLR)$subject.tooth.mean.pd, sample_data(phyCLR)$subject.tooth.mean.cal,
sample_data(phyCLR)$subject.tooth.mean.gm.cej, sample_data(phyCLR)$age )
Z=data.frame(sample_data(phyCLR)$Subject)
colnames(map1) = c("Class", "uws_fr", "dmfs", "pd", "cal", "gmcej", "age")
map1$Class = as.numeric(map1$Class)
subs = as.vector(sample_data(phyCLR)$Subject)
runs = as.vector(sample_data(phyCLR)$Pool_Name)
map1$uws_fr = abs(map1$uws_fr)
otus = as.matrix(otu_table(phyCLR))
#for some reason this wworks on tsp = tax_glom(phy, "Genus")
attach(map)
ord = cca(otus ~ map1$uws_fr+map1$dmfs+map1$pd+map1$cal+map1$gmcej+map1$age, Z=Z, Ssitenv=map1) # add
ord.out<-scores(ord,display=c("sp","wa","lc","bp","cn"))
#Extract site data first
df_sites<-data.frame(ord.out$sites,data.frame(sample_data(phyCLR)))
#Draw sites
p<-ggplot()
p<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort), size=2) +
theme_classic()
#Draw biplots
mupltiplier <- vegan:::ordiArrowMul(ord.out$biplot)
df_arrows<- ord.out$biplot*2
colnames(df_arrows)<-c("x","y")
df_arrows=as.data.frame(df_arrows)
foo = reshape2::colsplit(rownames(df_arrows), "([$])", c("junk", "variable"))
foo = plyr::revalue(as.factor(foo$variable), c("uws_fr"="UWS-FR",
"dmfs1"="DMFS1",
"dmfs2"="DMS2",
"pd"="PD",
"cal"="CAL",
"gmcej"="GM-CEJ"))
rownames(df_arrows) = foo
p<-p+geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm")))
p<-p+geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows)))
# Draw species
df_species<- as.data.frame(ord.out$species)
colnames(df_species)<-c("x","y")
df_species$ASV = rownames(df_species)
tax = data.frame(phyCLR@tax_table@.Data, ASV=rownames(phyCLR@tax_table@.Data), Abundance=taxa_sums(phyCLR))
dfTax = plyr::join(tax, df_species)
dfTax <- dfTax[order(dfTax$Abundance, decreasing=TRUE), ]
dfTax$ASV_Number = paste0("ASV", 1:nrow(dfTax))
dfTax = dfTax[1:15,]
# Either choose text or points
p<-p+geom_point(data=dfTax,aes(x,y, color=Genus), size=3)+
scale_color_manual(values=c("#FFDEE6","#03919B",
"#FE9866", "#BCEAF6", "#B4CF68",
"#0A5947","#FCCEB2","#C17A2A","#EC5064"))
Fig3A<-p+geom_text(data=dfTax,aes(x,y, label=ASV_Number))
p = ggplot()
p1<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort, color=DMFS.bytooth), size=2) +
theme_classic()
Fig3B<-p1+geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm"))) +
geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows))) +
guides(colour=guide_legend(title="DMFS"))
p = ggplot()
Fig3C<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort, color=subject.tooth.mean.cal), size=2) +
theme_classic() +
scale_color_gradientn(colours = pal) +
geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm"))) +
geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows))) +
guides(colour=guide_legend(title="Mean CAL"))
p = ggplot()
p1<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort, color=Habitat_Class), size=2) +
theme_classic()
Fig3D<-p1+geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm"))) +
geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows)))
p1<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort, color=Subject), size=2) +
theme_classic()
Fig3E<-p1+geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm"))) +
geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows))) +
guides(colour=guide_legend(title="DMFS"))
p = ggplot()
p1<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort, color=uwsfr), size=2) +
theme_classic()
Fig3F<-p1+geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm"))) +
geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows))) +
scale_color_viridis()
p = ggplot()
p1<-p+geom_point(data=df_sites,aes(x=CCA1,y=CCA2,shape=Cohort, color=DMFS.bytooth ), size=2) +
theme_classic()
Fig3G<-p1+geom_segment(data=df_arrows, aes(x = 0, y = 0, xend = x, yend = y),
arrow = arrow(length = unit(0.2, "cm"))) +
geom_text(data=as.data.frame(df_arrows*1.1),aes(x, y, label = rownames(df_arrows)))
ggsave(
grid.arrange(Fig3B,
Fig3E,
Fig3C,
Fig3A, ncol=2), file="~/Dropbox/oral-clinical-data-analysis-ms/FigureS4.png", width = 14, height = 10)
Figure 4: Estimated migration rates by habitat within each individual. Rates were inferred using the Sloan Community Neutral Model. Each hollow point represents the estimated migration rate for each of our defined metacommunities for each individual, while the solid point represents their average with standard error bars. Colors correspond to the patient cohort. The code for this figure is in the file dispersal.html/dispersal.rmd.
Figure4
Supplementary Figure 5: Estimated migration rates of tooth habitats (Gst) for each tooth habitat and tooth aspect. Colors correspond to cohort (SA1 = control cohort; SA3 = low flow cohort). Boxplots with outer edges of boxes encapsulate the first and third quartiles. The horizontal line bisecting the center of each box demarcates the median. Each point represents an individual sample/community within each of our defined metacommunities.The code for this section is in the dispersal.rmd/dispersal.html files.
Supplementary Figure 5
Figure 5: Symptomatic complaints of dry mouth predict low salivary flow. a) principal component analysis of subject responses to visual analog scale revealed two groups of questions – questions that segregate subjects who complain of impacts to quality of life (dry_speak, dry_qol, dry_swallow) and questions that indicate feelings of dry mouth (dry_tongue, dry_throat, dry_palate, dry_lips). b) conditional inference tree was used to identify the variables most predictive of separating patients into groups based on low (UWS-FR < 0.1ml/min) and not low (UWS-FR > 0.1 ml/min) subgroups.
quest.df <- read.csv("~/Dropbox/oral-clinical-data-analysis-data/dry_mouth_questionnaire_20190915.csv")
#add cohort variable and select only the VAS varaibles for input to a PCoA
quest.df$cohort <- ifelse(quest.df$aim == 3, "Sjögren's", "Control")
pca_in <- quest.df %>%
dplyr::select(., phq_dry_swallow:phq_dry_palate)
quest.df$cat = ifelse(quest.df$uwsfr < 0.1, "Lower", "Higher")
## Plot the PCoA
library("FactoMineR")
res.pca <- PCA(pca_in, graph = FALSE)
Figure5A = fviz_pca_biplot(res.pca,
col.ind = quest.df$cat, palette = "jco",
addEllipses = FALSE, label = "var",
col.var = "black", repel = FALSE,
legend.title = "Flow Rate Category")
ggsave(Figure5A, file="~/Dropbox/oral-clinical-data-analysis-ms/Figure5a.png", device="png", height = 4, width = 8)
outDf = Figure5A$coordinates
Boxplots of VAS reveal significant differences in responses between groups. Each question on the Visual Analog Scale querying the symptomatic complaints of dry mouth is plotted as an individual panel. The midline of each boxplot represents the median while the outer edges encompass the first and third quartiles. Points on the graph indicate outliers. Colors correspond to patient cohort (control, low flow). Across all questions respondents in the low flow cohort rated their impairment and subjective complaints of dry mouth as more severe than the patients in the control cohort (Wilcoxon rank sum tests, p < 0.05).
# plot the raw data
vsq = melt(quest.df, id.vars = c("redcap", "aim", "uwsfr", "cohort"))
vsq$value = as.numeric(as.character(vsq$value))
vsq = subset(vsq, variable != "cat")
FigureS4 = ggplot(vsq, aes(cohort, value, fill=cohort)) +
geom_boxplot() + facet_wrap(~variable, ncol=4) + ylab("VAS Rating (0-100)") + theme_classic()+
stat_compare_means()
FigureS4
ggsave(FigureS4, file="~/Dropbox/oral-clinical-data-analysis-ms/FigureS4.png", device="png", height = 6, width = 8)
Figure 5B: Symptomatic complaints of dry mouth predict low salivary flow. Conditional inference tree was used to identify the variables that most robustly separated patients into groups based on low (UWS-FR < 0.1ml/min) and not low (UWS-FR > 0.1 ml/min) subgroups. This model accurately identified 84.6% (11/13) subjects as having salivary flow rates lower than 0.1 ml/min. Most subjects who indicated that low salivary flow negatively impacted their overall quality of life (VAS >=56) had flow rates of less than 0.1 ml/min. On the other hand, most subjects who rated a negligible impact of low flow on their quality of life as well as an absence of dryness on their lips had flow rates exceeding 0.1 ml/min.
library(rpart)
library(rpart.plot)
# CART model
latlontree = rpart(uwsfr ~ phq_dry_swallow + phq_dry_qol + phq_dry_speak + phq_dry_throat + phq_dry_tongue + phq_dry_lips + phq_dry_palate, data=quest.df, method="anova")
quest.df$cohort = as.factor(as.character(quest.df$cohort))
latlontree = rpart(cohort ~ phq_dry_swallow + uwsfr, data=quest.df)
library(party)
quest.df$cat = ifelse(quest.df$uwsfr < 0.1, "low", "not.low")
quest.df$cat = as.factor(quest.df$cat)
fit <- ctree(cat ~ phq_dry_swallow + phq_dry_qol + phq_dry_speak + phq_dry_throat + phq_dry_tongue + phq_dry_lips + phq_dry_palate + uwsfr,
data=quest.df)
### extract terminal node ID, two ways
all.equal(predict(fit, type = "node"), where(fit))
## [1] TRUE
#get classification
table(predict(fit), quest.df$cat)
##
## low not.low
## low 11 2
## not.low 6 111
#estimated class probabilities
tr <- treeresponse(fit, newdata = quest.df)
tr = do.call(rbind, tr)
prob = data.frame(tr, quest.df)
#prob
model <- train(cat ~ phq_dry_swallow + phq_dry_qol + phq_dry_speak + phq_dry_throat +
phq_dry_tongue + phq_dry_lips + phq_dry_palate + uwsfr,
data = quest.df, method='ctree', tuneLength=5,
trControl=trainControl(
method='cv', number=3, classProbs=TRUE, summaryFunction=twoClassSummary))
png("~/Dropbox/oral-clinical-data-analysis-ms/Figure5b.png", res=80, height=800, width=800)
plot(fit)
dev.off()
## quartz_off_screen
## 2
let’s use a training / test set. The topmost predictor is significant, but the others aren’t. We have limited power due to small N.
library(caret)
set.seed(3456)
trainIndex <- createDataPartition(quest.df$cat, p = .7,
list = FALSE,
times = 1)
train <- quest.df[ trainIndex,]
test <- quest.df[-trainIndex,]
library(caret)
tmodel = ctree(formula=cat ~ phq_dry_swallow + phq_dry_qol + phq_dry_speak + phq_dry_throat +
phq_dry_tongue + phq_dry_lips + phq_dry_palate + uwsfr,
data = train)
print(tmodel)
##
## Conditional inference tree with 3 terminal nodes
##
## Response: cat
## Inputs: phq_dry_swallow, phq_dry_qol, phq_dry_speak, phq_dry_throat, phq_dry_tongue, phq_dry_lips, phq_dry_palate, uwsfr
## Number of observations: 92
##
## 1) phq_dry_qol <= 50; criterion = 1, statistic = 49.092
## 2) phq_dry_lips <= 28; criterion = 0.996, statistic = 12.24
## 3)* weights = 60
## 2) phq_dry_lips > 28
## 4)* weights = 24
## 1) phq_dry_qol > 50
## 5)* weights = 8
plot(tmodel)
pred = predict(tmodel, test)
foo = data.frame(pred, test)
table(foo$cat, foo$pred)
##
## low not.low
## low 3 2
## not.low 3 30
Supplementary Figure 5: Random forest analysis also identifies dryness of lips and impacts on quality of life as the most discriminant features. Random forest analysis was used to identify the most discriminant features included in the Visual Analog Scale. The out of box error rate for the random forest model was 8.46%. The most discriminant features were the impact of dry mouth on the overall quality of life (phq_dry_qol), the dryness of the lips (phq_dry_lips) consistent with the classification tree.
library(randomForest)
library(rfPermute)
fit <- randomForest(cat ~ phq_dry_swallow + phq_dry_qol + phq_dry_speak + phq_dry_throat + phq_dry_tongue + phq_dry_lips + phq_dry_palate + cohort, data=quest.df, proximity = TRUE)
print(fit) # view results
##
## Call:
## randomForest(formula = cat ~ phq_dry_swallow + phq_dry_qol + phq_dry_speak + phq_dry_throat + phq_dry_tongue + phq_dry_lips + phq_dry_palate + cohort, data = quest.df, proximity = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 8.46%
## Confusion matrix:
## low not.low class.error
## low 11 6 0.35294118
## not.low 5 108 0.04424779
importance(fit) # importance of each predictor
## MeanDecreaseGini
## phq_dry_swallow 2.630653
## phq_dry_qol 4.765104
## phq_dry_speak 3.450197
## phq_dry_throat 1.754508
## phq_dry_tongue 4.178521
## phq_dry_lips 4.300532
## phq_dry_palate 3.558803
## cohort 3.523465
imp = data.frame(caret::varImp(fit))
imp$factor = rownames(imp)
imp = imp[order(-imp$Overall),]
ordering = unique(imp$factor)
imp$factor = as.factor(as.character(imp$factor))
imp$factor <- factor(imp$factor, levels = ordering)
p = ggplot(imp, aes(factor, Overall)) + geom_point() + coord_flip() + theme_classic() +
xlab("Overall")
p
ggsave(p, file="~/Dropbox/oral-clinical-data-analysis-ms/FigureS5.png", device="png", height = 6, width = 8)